Introduction

Using the 2017-2018 NHANES data, which you can read more about in the tab below, our group explores potential relationships between participant mental health, physical activity habits, and demographic traits. There is a widespread assumption that increased physical activity is positively correlated with better mental health. Using statistical methods, we examine whether this assumption holds and what other factors may be associated with mental health status.

Given this common inference, we hypothesize that poor mental health will correlate with fewer minutes of vigorous activity and more time spent sedentary.

These relationships are important to analyze as mental health concerns grow, especially in such difficult times. In 2017 and 2018, when these data were collected, nationwide stress was already mounting with increased natural disasters, growing political tensions, and other societal concerns. Since then, global stress levels have risen while the ability to exercise has decreased significantly in the midst of the pandemic. By analyzing these data from the pre-COVID era, we may recognize a correlation among physical activity, demographic group, and mental health that can be used to encourage changes in people’s daily habits and improve mental health holistically.

About the data

Original NHANES Data

The National Health and Nutrition Examination Survey (NHANES) is a program designed to assess the health and nutritional status of adults and children in the US. The survey examines a nationally representative sample of about 5,000 people each year. It combines interviews and physical examinations. The surveys are crafted to focus on various health topics and demographic groups.

For our project, we chose the 2017-2018 wave of data. These data are recent and contain a full year’s worth of uninterrupted information. Datasets are available covering demographics, dietary data, examination data, laboratory data, and questionnaire data. From the Questionnaire Data tab, we pulled the Mental Health - Depression Screener dataset as well as the Physical Activity dataset. The Questionnaire data seemed to have the broadest and most readily available data. From the Demographics Data tab, we used the only available dataset.

The 2017-2018 Demographic dataset includes 46 variables. Of these, we plan to use “gender” and “age.” These data were collected at a screening. Gender is coded numerically, where 1 represents male and 2 represents female. Age is given as an integer number of years at the time of screening; individuals aged 80 and over are top-coded as 80.

The 2017-2018 Physical Activity dataset contains 17 quantitative variables. Of these, we plan to use “Minutes of Vigorous Recreational Activities” and “Minutes of Sedentary Activity,” which are the variables PAD660 and PAD680 in the dataset. We will use cut() to group minutes of vigorous activity into “low,” “moderate,” and “intense” levels and will similarly group time spent sedentary into “low,” “moderate,” and “intense.”

The 2017-2018 Mental Health dataset has 11 variables, all answered categorically with multiple choice options. From these, we plan to use the following variables: “Feeling down, depressed, or hopeless,” “Trouble sleeping or sleeping too much,” “Feeling bad about yourself,” and “Trouble concentrating on things.” Each question refers to the 2 weeks prior to taking the survey. In the dataset they are denoted by DPQ020, DPQ030, DPQ060, and DPQ070. A scale is provided for each, with 0 representing “Not at all,” 1 representing “Several days,” 2 representing “More than half the days,” and 3 representing “Nearly every day.” The codes 7 and 9 represent “Refuse to answer” and “Don’t know,” respectively. These responses were removed from our final dataset.

Data cleaning and wrangling methodology

To clean these data, we first selected the aforementioned variables of interest and removed the following entries from each data set: “Refuse to answer,” “Don’t know,” “NA”, and “missing.”

Using dplyr and full_join(), we first joined two datasets by SEQN, a common variable that identified each participant across the datasets, and created a new dataset. To that, we used full_join() again with SEQN as the common variable to combine all 3 datasets into one. From here, we then removed any resulting “NA” entries. This resulted in a new set of 1,271 observations.

# Loading datasets
# read.xport() is assumed to come from the foreign package; dplyr supplies %>%, select(), filter(), and the joins
library(foreign)
library(dplyr)

phys <- read.xport("PAQ_J.XPT") # Physical Activity
mental <- read.xport("DPQ_J.XPT") # Mental Health
demo <- read.xport("DEMO_J (1).XPT") # Demographic Data

# Select variables of interest. Remove instances of "NA", "don't know", "refuse to answer", and "missing".
phys1 <- phys %>%
  select(PAD660, PAD680, SEQN) %>%
  na.omit() %>%
  filter(!PAD660 %in% c(7777, 9999), !PAD680 %in% c(7777, 9999))
mental1 <- mental %>%
  select(DPQ020, DPQ030, DPQ060, DPQ070, SEQN) %>%
  na.omit() %>%
  filter(!DPQ020 %in% c(7, 9), !DPQ030 %in% c(7, 9), !DPQ060 %in% c(7, 9), !DPQ070 %in% c(7, 9))
demo1 <- demo %>%
  select(RIAGENDR, RIDAGEYR, RIDRETH3, SEQN) %>%
  na.omit() # selecting gender, age, and race

# Joining datasets by participant ID (SEQN) and dropping resulting NAs using dplyr
dat1 <- full_join(phys1, mental1, by = "SEQN")
dat2 <- full_join(dat1, demo1, by = "SEQN")
dat2 <- dat2 %>% na.omit()
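Because each table has already had its missing values dropped, a full join followed by na.omit() keeps only participants who appear in all three tables, which is what inner_join() does directly. The sketch below illustrates this on three small made-up tables; phys_toy, mental_toy, and demo_toy are hypothetical stand-ins, not NHANES data.

```r
library(dplyr)

# Hypothetical toy tables standing in for phys1, mental1, and demo1
phys_toy   <- data.frame(SEQN = c(1, 2, 3), PAD660 = c(30, 60, 120))
mental_toy <- data.frame(SEQN = c(2, 3, 4), DPQ020 = c(0, 1, 3))
demo_toy   <- data.frame(SEQN = c(1, 2, 3, 4), RIDAGEYR = c(25, 40, 61, 80))

# Approach described in the text: full joins, then drop incomplete rows
via_full <- phys_toy %>%
  full_join(mental_toy, by = "SEQN") %>%
  full_join(demo_toy, by = "SEQN") %>%
  na.omit()

# Shortcut: inner joins keep only SEQN values found in every table
via_inner <- phys_toy %>%
  inner_join(mental_toy, by = "SEQN") %>%
  inner_join(demo_toy, by = "SEQN")

# Both approaches keep the same participants
via_full$SEQN
via_inner$SEQN
```

Either route should give the same rows here; the inner join simply skips the intermediate step of creating and then removing NA entries.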

From here, we renamed the variables to make them easier to keep track of throughout the project.

We then used as.factor() to turn some variables into factor variables for easier manipulation later on.

We then created new variables for all mental health variables, as well as gender, to recode the responses from a numerical scale to an easier-to-understand set of labels. Although these new variables (depr2, sleep2, feelBad2, concen2, and gender2) may seem to duplicate the originals, we kept both versions in the dataset in case the numerical scales proved more helpful at another point in the project.

Using the physical activity variables, we similarly created 2 more variables in the dataset in which minsPA and minsSed were each grouped into 3 levels. We used cut() to split each numerical range into 3 levels: “low,” “moderate,” and “intense.” This will be helpful for visualizing mental health differences between the groups.
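To make the interval behaviour of cut() concrete, here is a small sketch using the same break points as the physical activity grouping; the input minutes are made-up values. By default cut() builds right-closed intervals, so 75 minutes falls in “low” while 76 falls in “moderate,” and values outside the outermost breaks (0, or anything above 480 here) become NA.

```r
# Made-up minutes of vigorous activity, chosen to sit on and around the break points
mins <- c(0, 10, 75, 76, 250, 251, 480, 600)

# Same breaks and labels as the minsPA grouping: (0,75], (75,250], (250,480]
levels_pa <- cut(mins, breaks = c(0, 75, 250, 480),
                 labels = c("low", "moderate", "intense"))

# Show each value next to its assigned level, then count levels including NA
data.frame(mins, levels_pa)
table(levels_pa, useNA = "ifany")
```

If zero minutes or very large values should be kept, the breaks would need to be widened or include.lowest = TRUE set; whether that is appropriate depends on the range of the cleaned data.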

We then turned our new dataframe into a csv file to share easily between group members.

# Data cleaning

# renaming variables in dataset so they are easier to understand------------------------------------
dat2 <- dat2 %>% 
  rename(
    minsPA = PAD660,
    minsSed = PAD680,
    IDNum = SEQN,
    depr = DPQ020,
    sleep = DPQ030,
    feelBad = DPQ060,
    concen = DPQ070,
    gender = RIAGENDR,
    age = RIDAGEYR,
    race = RIDRETH3
    )

# making ints into categorical variables as needed - depression screening questionnaire and demographic data
dat2$depr <- as.factor(dat2$depr)
dat2$sleep <- as.factor(dat2$sleep)
dat2$feelBad <- as.factor(dat2$feelBad)
dat2$concen <- as.factor(dat2$concen)
dat2$gender <- as.factor(dat2$gender)



# re-coding the responses using forcats ---------------------------------------------------------------------------

dat2 <- dat2 %>% 
  mutate(depr2 = fct_recode(depr, 
                              "Not at all" = "0", 
                              "Several Days" = "1", 
                              "More than half the days" = "2", 
                              "Nearly every day" = "3"),
         sleep2 = fct_recode(sleep, 
                              "Not at all" = "0", 
                              "Several Days" = "1", 
                              "More than half the days" = "2", 
                              "Nearly every day" = "3"),
         feelBad2 = fct_recode(feelBad, 
                              "Not at all" = "0", 
                              "Several Days" = "1", 
                              "More than half the days" = "2", 
                              "Nearly every day" = "3"),
         concen2 = fct_recode(concen, 
                              "Not at all" = "0", 
                              "Several Days" = "1", 
                              "More than half the days" = "2", 
                              "Nearly every day" = "3"),
         gender2 = fct_recode(gender,
                              "Male" = "1",
                              "Female" = "2")
  )

# Making Mental Health vars binary for regression analysis and saving as var3
dat2 <- dat2 %>% 
  mutate(depr3 = fct_recode(depr, 
                              "0" = "0", 
                              "1"= "1", 
                              "1" = "2", 
                              "1" = "3"),
         sleep3 = fct_recode(sleep, 
                              "0" = "0", 
                              "1"= "1", 
                              "1" = "2", 
                              "1" = "3"),
         feelBad3 = fct_recode(feelBad, 
                            "0" = "0", 
                              "1"= "1", 
                              "1" = "2", 
                              "1" = "3"),
         concen3 = fct_recode(concen, 
                           "0" = "0", 
                              "1"= "1", 
                              "1" = "2", 
                              "1" = "3"))
dat2$depr3 <- as.factor(dat2$depr3)
dat2$sleep3 <- as.factor(dat2$sleep3)
dat2$feelBad3 <- as.factor(dat2$feelBad3)
dat2$concen3 <- as.factor(dat2$concen3)

# categorize phys activity data----------------------------------------------------------------------------------

dat2 <- data.frame(dat2)
dat2$minsPAlevel = cut(dat2$minsPA, c(0,75,250, 480), labels = c("low", "moderate", "intense"))
dat2$minsSedlevel = cut(dat2$minsSed, c(0,380,760, 1140), labels = c("low", "moderate", "intense"))
dat2$agelevel = cut(dat2$age, c(17,35,60,80), labels = c("young", "middle aged", "old"))

# turn into csv file for other students to use
write.csv(dat2, "~/Desktop/Senior Yr/QTM 151/projectdata.csv", row.names = FALSE)

Visualizing the data

Here we will take a look at our data categorically. Navigate using the tabs below.

Mental Health and Activity

# Figures 1-4: depr2 and sleep2 by level of sedentary activity and level of physical activity

graph1 <- ggplot(dat2, aes(x=depr2, group= minsSedlevel)) + 
          geom_bar(aes(y = ..prop.., fill = factor(..x..), stat = "count")) +
    geom_text(aes(label = scales::percent(..prop..), y= ..prop.. ), stat= "count", vjust= -.5)+
          facet_wrap(~minsSedlevel)+
          labs(x="Level of depression", y="Proportion", title="Figure 1: Depression levels separated by level of sedentary activity", fill= "depr2") +
          scale_fill_brewer(name = 'Levels', breaks = 1:4, 
          labels = levels(dat2$depr2), palette = 'Set2')+
          scale_y_continuous(labels = scales::percent_format())+
          theme_minimal()+
          theme(axis.text.x = element_blank(), plot.title = element_text(vjust = 1, size=12))
graph1

data1 <-ggplot_build(graph1)
head(data1$data)
## [[1]]
##       fill          y count       prop x  stat group PANEL ymin       ymax
## 1  #66C2A5 0.82113821   707 0.82113821 1 count     1     1    0 0.82113821
## 2  #FC8D62 0.12427410   107 0.12427410 2 count     1     1    0 0.12427410
## 3  #8DA0CB 0.03135889    27 0.03135889 3 count     1     1    0 0.03135889
## 4  #E78AC3 0.02322880    20 0.02322880 4 count     1     1    0 0.02322880
## 5  #66C2A5 0.81250000   312 0.81250000 1 count     2     2    0 0.81250000
## 6  #FC8D62 0.15885417    61 0.15885417 2 count     2     2    0 0.15885417
## 7  #8DA0CB 0.01562500     6 0.01562500 3 count     2     2    0 0.01562500
## 8  #E78AC3 0.01302083     5 0.01302083 4 count     2     2    0 0.01302083
## 9  #66C2A5 0.88461538    23 0.88461538 1 count     3     3    0 0.88461538
## 10 #FC8D62 0.03846154     1 0.03846154 2 count     3     3    0 0.03846154
## 11 #8DA0CB 0.03846154     1 0.03846154 3 count     3     3    0 0.03846154
## 12 #E78AC3 0.03846154     1 0.03846154 4 count     3     3    0 0.03846154
##    xmin xmax colour size linetype alpha
## 1  0.55 1.45     NA  0.5        1    NA
## 2  1.55 2.45     NA  0.5        1    NA
## 3  2.55 3.45     NA  0.5        1    NA
## 4  3.55 4.45     NA  0.5        1    NA
## 5  0.55 1.45     NA  0.5        1    NA
## 6  1.55 2.45     NA  0.5        1    NA
## 7  2.55 3.45     NA  0.5        1    NA
## 8  3.55 4.45     NA  0.5        1    NA
## 9  0.55 1.45     NA  0.5        1    NA
## 10 1.55 2.45     NA  0.5        1    NA
## 11 2.55 3.45     NA  0.5        1    NA
## 12 3.55 4.45     NA  0.5        1    NA
## 
## [[2]]
##             y label count       prop x width group PANEL colour size angle
## 1  0.82113821 82.1%   707 0.82113821 1   0.9     1     1  black 3.88     0
## 2  0.12427410 12.4%   107 0.12427410 2   0.9     1     1  black 3.88     0
## 3  0.03135889  3.1%    27 0.03135889 3   0.9     1     1  black 3.88     0
## 4  0.02322880  2.3%    20 0.02322880 4   0.9     1     1  black 3.88     0
## 5  0.81250000 81.2%   312 0.81250000 1   0.9     2     2  black 3.88     0
## 6  0.15885417 15.9%    61 0.15885417 2   0.9     2     2  black 3.88     0
## 7  0.01562500  1.6%     6 0.01562500 3   0.9     2     2  black 3.88     0
## 8  0.01302083  1.3%     5 0.01302083 4   0.9     2     2  black 3.88     0
## 9  0.88461538 88.5%    23 0.88461538 1   0.9     3     3  black 3.88     0
## 10 0.03846154  3.8%     1 0.03846154 2   0.9     3     3  black 3.88     0
## 11 0.03846154  3.8%     1 0.03846154 3   0.9     3     3  black 3.88     0
## 12 0.03846154  3.8%     1 0.03846154 4   0.9     3     3  black 3.88     0
##    hjust vjust alpha family fontface lineheight
## 1    0.5  -0.5    NA               1        1.2
## 2    0.5  -0.5    NA               1        1.2
## 3    0.5  -0.5    NA               1        1.2
## 4    0.5  -0.5    NA               1        1.2
## 5    0.5  -0.5    NA               1        1.2
## 6    0.5  -0.5    NA               1        1.2
## 7    0.5  -0.5    NA               1        1.2
## 8    0.5  -0.5    NA               1        1.2
## 9    0.5  -0.5    NA               1        1.2
## 10   0.5  -0.5    NA               1        1.2
## 11   0.5  -0.5    NA               1        1.2
## 12   0.5  -0.5    NA               1        1.2
graph2 <- ggplot(dat2, aes(x=depr2, group= minsPAlevel)) + 
          geom_bar(aes(y = ..prop.., fill = factor(..x..), stat = "count")) +
    geom_text(aes(label = scales::percent(..prop..), y= ..prop.. ), stat= "count", vjust= -.5)+
          facet_wrap(~minsPAlevel)+
          labs(x="Level of depression", y="Proportion", title="Figure 2: Depression levels separated by level of physical activity", fill= "depr2") +
          scale_fill_brewer(name = 'Levels', breaks = 1:4, 
          labels = levels(dat2$depr2), palette = 'Set2')+
          scale_y_continuous(labels = scales::percent_format())+
          theme_minimal()+
          theme(axis.text.x = element_blank(), plot.title = element_text(vjust = 1, size=12)) 
graph2

data2 <-ggplot_build(graph2)
head(data2$data)
## [[1]]
##       fill          y count       prop x  stat group PANEL ymin       ymax
## 1  #66C2A5 0.82692308   688 0.82692308 1 count     1     1    0 0.82692308
## 2  #FC8D62 0.12740385   106 0.12740385 2 count     1     1    0 0.12740385
## 3  #8DA0CB 0.02884615    24 0.02884615 3 count     1     1    0 0.02884615
## 4  #E78AC3 0.01682692    14 0.01682692 4 count     1     1    0 0.01682692
## 5  #66C2A5 0.80510441   347 0.80510441 1 count     2     2    0 0.80510441
## 6  #FC8D62 0.14385151    62 0.14385151 2 count     2     2    0 0.14385151
## 7  #8DA0CB 0.02320186    10 0.02320186 3 count     2     2    0 0.02320186
## 8  #E78AC3 0.02784223    12 0.02784223 4 count     2     2    0 0.02784223
## 9  #66C2A5 0.87500000     7 0.87500000 1 count     3     3    0 0.87500000
## 10 #FC8D62 0.12500000     1 0.12500000 2 count     3     3    0 0.12500000
##    xmin xmax colour size linetype alpha
## 1  0.55 1.45     NA  0.5        1    NA
## 2  1.55 2.45     NA  0.5        1    NA
## 3  2.55 3.45     NA  0.5        1    NA
## 4  3.55 4.45     NA  0.5        1    NA
## 5  0.55 1.45     NA  0.5        1    NA
## 6  1.55 2.45     NA  0.5        1    NA
## 7  2.55 3.45     NA  0.5        1    NA
## 8  3.55 4.45     NA  0.5        1    NA
## 9  0.55 1.45     NA  0.5        1    NA
## 10 1.55 2.45     NA  0.5        1    NA
## 
## [[2]]
##             y label count       prop x width group PANEL colour size angle
## 1  0.82692308 82.7%   688 0.82692308 1   0.9     1     1  black 3.88     0
## 2  0.12740385 12.7%   106 0.12740385 2   0.9     1     1  black 3.88     0
## 3  0.02884615  2.9%    24 0.02884615 3   0.9     1     1  black 3.88     0
## 4  0.01682692  1.7%    14 0.01682692 4   0.9     1     1  black 3.88     0
## 5  0.80510441 80.5%   347 0.80510441 1   0.9     2     2  black 3.88     0
## 6  0.14385151 14.4%    62 0.14385151 2   0.9     2     2  black 3.88     0
## 7  0.02320186  2.3%    10 0.02320186 3   0.9     2     2  black 3.88     0
## 8  0.02784223  2.8%    12 0.02784223 4   0.9     2     2  black 3.88     0
## 9  0.87500000 87.5%     7 0.87500000 1   0.9     3     3  black 3.88     0
## 10 0.12500000 12.5%     1 0.12500000 2   0.9     3     3  black 3.88     0
##    hjust vjust alpha family fontface lineheight
## 1    0.5  -0.5    NA               1        1.2
## 2    0.5  -0.5    NA               1        1.2
## 3    0.5  -0.5    NA               1        1.2
## 4    0.5  -0.5    NA               1        1.2
## 5    0.5  -0.5    NA               1        1.2
## 6    0.5  -0.5    NA               1        1.2
## 7    0.5  -0.5    NA               1        1.2
## 8    0.5  -0.5    NA               1        1.2
## 9    0.5  -0.5    NA               1        1.2
## 10   0.5  -0.5    NA               1        1.2


Figure 1 does not appear to corroborate our hypothesis. Participants with intense levels of sedentary activity were no more likely to report feeling depressed than those with low levels: 88.46% of the intense sedentary group reported no depression, compared with 82.11% and 81.25% of the low and moderate groups. In Figure 2, those with intense levels of physical activity more often reported no depression (87.5%, versus 80.51% and 82.69% for the moderate and low groups). This could suggest that the relationship runs in one direction only: higher levels of physical activity are associated with a lower risk of depression, while the level of sedentary activity matters less. Both intense groups are very small (26 and 8 participants, respectively), so these percentages should be read with caution.
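The percentages quoted above can be re-derived from the counts in the printed ggplot_build() output. The sketch below copies those counts into two small matrices (sed_counts and pa_counts are names introduced here for illustration) and computes column proportions. Response levels that do not appear in the printed output for the intense physical activity group are assumed to have a count of zero.

```r
responses <- c("Not at all", "Several Days", "More than half the days", "Nearly every day")
groups <- c("low", "moderate", "intense")

# Counts copied from the printed output; each column is one activity level
sed_counts <- matrix(
  c(707, 107, 27, 20,   # low sedentary
    312,  61,  6,  5,   # moderate sedentary
     23,   1,  1,  1),  # intense sedentary
  nrow = 4, dimnames = list(depr = responses, sedentary = groups)
)
pa_counts <- matrix(
  c(688, 106, 24, 14,   # low physical activity
    347,  62, 10, 12,   # moderate physical activity
      7,   1,  0,  0),  # intense physical activity (missing levels assumed zero)
  nrow = 4, dimnames = list(depr = responses, physical = groups)
)

# Group sizes: the intense groups are very small
colSums(sed_counts)
colSums(pa_counts)

# Percentage of each group giving each response
round(100 * prop.table(sed_counts, margin = 2), 2)
round(100 * prop.table(pa_counts, margin = 2), 2)
```

Printing the group sizes next to the percentages makes it easier to see that a single participant shifts the intense physical activity percentages by 12.5 points.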

graph3 <- ggplot(dat2, aes(x=sleep2, group= minsSedlevel)) + 
          geom_bar(aes(y = ..prop.., fill = factor(..x..), stat = "count")) +
  geom_text(aes(label = scales::percent(..prop..), y= ..prop.. ), stat= "count", vjust= -.5) +
          facet_wrap(~minsSedlevel) +
          labs(x="Quality of Sleep",y="Proportion", title="Figure 3: Quality of sleep separated by level of sedentary activity", fill= "sleep2") +
          scale_fill_brewer(name = 'Levels', breaks = 1:4, 
          labels = levels(dat2$sleep2), palette = 'Set2') +
          scale_y_continuous(labels = scales::percent_format()) +
          theme_minimal() +
          theme(axis.text.x = element_blank(), plot.title = element_text(vjust = 1, size=12)) 
graph3

data3 <-ggplot_build(graph3)
head(data3$data)
## [[1]]
##       fill          y count       prop x  stat group PANEL ymin       ymax
## 1  #66C2A5 0.66202091   570 0.66202091 1 count     1     1    0 0.66202091
## 2  #FC8D62 0.21254355   183 0.21254355 2 count     1     1    0 0.21254355
## 3  #8DA0CB 0.07200929    62 0.07200929 3 count     1     1    0 0.07200929
## 4  #E78AC3 0.05342625    46 0.05342625 4 count     1     1    0 0.05342625
## 5  #66C2A5 0.64583333   248 0.64583333 1 count     2     2    0 0.64583333
## 6  #FC8D62 0.23958333    92 0.23958333 2 count     2     2    0 0.23958333
## 7  #8DA0CB 0.05989583    23 0.05989583 3 count     2     2    0 0.05989583
## 8  #E78AC3 0.05468750    21 0.05468750 4 count     2     2    0 0.05468750
## 9  #66C2A5 0.50000000    13 0.50000000 1 count     3     3    0 0.50000000
## 10 #FC8D62 0.34615385     9 0.34615385 2 count     3     3    0 0.34615385
## 11 #8DA0CB 0.11538462     3 0.11538462 3 count     3     3    0 0.11538462
## 12 #E78AC3 0.03846154     1 0.03846154 4 count     3     3    0 0.03846154
##    xmin xmax colour size linetype alpha
## 1  0.55 1.45     NA  0.5        1    NA
## 2  1.55 2.45     NA  0.5        1    NA
## 3  2.55 3.45     NA  0.5        1    NA
## 4  3.55 4.45     NA  0.5        1    NA
## 5  0.55 1.45     NA  0.5        1    NA
## 6  1.55 2.45     NA  0.5        1    NA
## 7  2.55 3.45     NA  0.5        1    NA
## 8  3.55 4.45     NA  0.5        1    NA
## 9  0.55 1.45     NA  0.5        1    NA
## 10 1.55 2.45     NA  0.5        1    NA
## 11 2.55 3.45     NA  0.5        1    NA
## 12 3.55 4.45     NA  0.5        1    NA
## 
## [[2]]
##             y label count       prop x width group PANEL colour size angle
## 1  0.66202091 66.2%   570 0.66202091 1   0.9     1     1  black 3.88     0
## 2  0.21254355 21.3%   183 0.21254355 2   0.9     1     1  black 3.88     0
## 3  0.07200929  7.2%    62 0.07200929 3   0.9     1     1  black 3.88     0
## 4  0.05342625  5.3%    46 0.05342625 4   0.9     1     1  black 3.88     0
## 5  0.64583333 64.6%   248 0.64583333 1   0.9     2     2  black 3.88     0
## 6  0.23958333 24.0%    92 0.23958333 2   0.9     2     2  black 3.88     0
## 7  0.05989583  6.0%    23 0.05989583 3   0.9     2     2  black 3.88     0
## 8  0.05468750  5.5%    21 0.05468750 4   0.9     2     2  black 3.88     0
## 9  0.50000000 50.0%    13 0.50000000 1   0.9     3     3  black 3.88     0
## 10 0.34615385 34.6%     9 0.34615385 2   0.9     3     3  black 3.88     0
## 11 0.11538462 11.5%     3 0.11538462 3   0.9     3     3  black 3.88     0
## 12 0.03846154  3.8%     1 0.03846154 4   0.9     3     3  black 3.88     0
##    hjust vjust alpha family fontface lineheight
## 1    0.5  -0.5    NA               1        1.2
## 2    0.5  -0.5    NA               1        1.2
## 3    0.5  -0.5    NA               1        1.2
## 4    0.5  -0.5    NA               1        1.2
## 5    0.5  -0.5    NA               1        1.2
## 6    0.5  -0.5    NA               1        1.2
## 7    0.5  -0.5    NA               1        1.2
## 8    0.5  -0.5    NA               1        1.2
## 9    0.5  -0.5    NA               1        1.2
## 10   0.5  -0.5    NA               1        1.2
## 11   0.5  -0.5    NA               1        1.2
## 12   0.5  -0.5    NA               1        1.2
graph4 <- ggplot(dat2, aes(x=sleep2, group= minsPAlevel)) + 
          geom_bar(aes(y = ..prop.., fill = factor(..x..), stat = "count")) +
  geom_text(aes(label = scales::percent(..prop..), y= ..prop.. ), stat= "count", vjust= -.5) +
          facet_wrap(~minsPAlevel) +
          labs(x="Quality of Sleep",y="Proportion", title="Figure 4: Quality of sleep separated by level of physical activity", fill= "sleep2") +
          scale_fill_brewer(name = 'Levels', breaks = 1:4, 
          labels = levels(dat2$sleep2), palette = 'Set2') +
          scale_y_continuous(labels = scales::percent_format()) +
          theme_minimal() +
          theme(axis.text.x = element_blank(), plot.title = element_text(vjust = 1, size=12)) 
graph4

data4 <-ggplot_build(graph4)
head(data4$data)
## [[1]]
##       fill          y count       prop x  stat group PANEL ymin       ymax
## 1  #66C2A5 0.65625000   546 0.65625000 1 count     1     1    0 0.65625000
## 2  #FC8D62 0.22475962   187 0.22475962 2 count     1     1    0 0.22475962
## 3  #8DA0CB 0.06490385    54 0.06490385 3 count     1     1    0 0.06490385
## 4  #E78AC3 0.05408654    45 0.05408654 4 count     1     1    0 0.05408654
## 5  #66C2A5 0.64965197   280 0.64965197 1 count     2     2    0 0.64965197
## 6  #FC8D62 0.22041763    95 0.22041763 2 count     2     2    0 0.22041763
## 7  #8DA0CB 0.07656613    33 0.07656613 3 count     2     2    0 0.07656613
## 8  #E78AC3 0.05336427    23 0.05336427 4 count     2     2    0 0.05336427
## 9  #66C2A5 0.62500000     5 0.62500000 1 count     3     3    0 0.62500000
## 10 #FC8D62 0.25000000     2 0.25000000 2 count     3     3    0 0.25000000
## 11 #8DA0CB 0.12500000     1 0.12500000 3 count     3     3    0 0.12500000
##    xmin xmax colour size linetype alpha
## 1  0.55 1.45     NA  0.5        1    NA
## 2  1.55 2.45     NA  0.5        1    NA
## 3  2.55 3.45     NA  0.5        1    NA
## 4  3.55 4.45     NA  0.5        1    NA
## 5  0.55 1.45     NA  0.5        1    NA
## 6  1.55 2.45     NA  0.5        1    NA
## 7  2.55 3.45     NA  0.5        1    NA
## 8  3.55 4.45     NA  0.5        1    NA
## 9  0.55 1.45     NA  0.5        1    NA
## 10 1.55 2.45     NA  0.5        1    NA
## 11 2.55 3.45     NA  0.5        1    NA
## 
## [[2]]
##             y label count       prop x width group PANEL colour size angle
## 1  0.65625000 65.6%   546 0.65625000 1   0.9     1     1  black 3.88     0
## 2  0.22475962 22.5%   187 0.22475962 2   0.9     1     1  black 3.88     0
## 3  0.06490385  6.5%    54 0.06490385 3   0.9     1     1  black 3.88     0
## 4  0.05408654  5.4%    45 0.05408654 4   0.9     1     1  black 3.88     0
## 5  0.64965197 65.0%   280 0.64965197 1   0.9     2     2  black 3.88     0
## 6  0.22041763 22.0%    95 0.22041763 2   0.9     2     2  black 3.88     0
## 7  0.07656613  7.7%    33 0.07656613 3   0.9     2     2  black 3.88     0
## 8  0.05336427  5.3%    23 0.05336427 4   0.9     2     2  black 3.88     0
## 9  0.62500000 62.5%     5 0.62500000 1   0.9     3     3  black 3.88     0
## 10 0.25000000 25.0%     2 0.25000000 2   0.9     3     3  black 3.88     0
## 11 0.12500000 12.5%     1 0.12500000 3   0.9     3     3  black 3.88     0
##    hjust vjust alpha family fontface lineheight
## 1    0.5  -0.5    NA               1        1.2
## 2    0.5  -0.5    NA               1        1.2
## 3    0.5  -0.5    NA               1        1.2
## 4    0.5  -0.5    NA               1        1.2
## 5    0.5  -0.5    NA               1        1.2
## 6    0.5  -0.5    NA               1        1.2
## 7    0.5  -0.5    NA               1        1.2
## 8    0.5  -0.5    NA               1        1.2
## 9    0.5  -0.5    NA               1        1.2
## 10   0.5  -0.5    NA               1        1.2
## 11   0.5  -0.5    NA               1        1.2


The graphs for sleep tell a similar story to the graphs for feelings of depression, with small group sizes again limiting what can be said. In Figure 3, the low and moderate sedentary groups look alike: 66.2% and 64.6% reported no sleep trouble. Only 50.0% of the intense sedentary group did, but that group contains just 26 participants. Figure 4 shows little difference in sleep quality across physical activity levels, with 65.6%, 65.0%, and 62.5% of the low, moderate, and intense groups reporting no sleep trouble. Because only 8 participants reported intense physical activity, no firm conclusion can be drawn about that group.

Gender and Mental Health

#Original graphs factored by Sex
graph5 <- ggplot(data=dat2, aes(x=depr2, fill = factor(gender))) + geom_bar(position="dodge") + facet_wrap(~minsSedlevel)+labs(x="Frequency of feeling depressed", y="Count", title="Figure 5: Frequency of feeling depressed separated \n by level of sedentary activity factored by sex")+ theme_minimal()+
  theme(axis.text.x = element_text(angle = 90), plot.title = element_text(vjust = 1, size=12), axis.title.x = element_text(vjust = -2))
graph5

graph6 <-ggplot(data=dat2, aes(x=depr2, fill = gender)) + geom_bar(position="dodge") + facet_wrap(~minsPAlevel)+labs(x="Frequency of feeling depressed", y="Count", title="Figure 6: Frequency of feeling depressed separated \n by level of physical activity factored by sex")+
   theme_minimal()+
  theme(axis.text.x = element_text(angle = 90), plot.title = element_text(vjust = 1, size=12), axis.title.x = element_text(vjust = -2))
graph6


While these charts seem to support the conclusions drawn above, the effect may be stronger for females than for males. In Figure 5, there is a large gender difference among those with low and moderate sedentary activity, whereas participants with intense sedentary activity have similar levels of depression regardless of gender. In Figure 6, females with low levels of physical activity have a higher rate of depression than males, but that relationship reverses at moderate and intense levels of physical activity.

graph7 <- ggplot(data=dat2, aes(x=sleep2, fill = factor(gender))) + geom_bar(position="dodge") + facet_wrap(~minsSedlevel)+labs(x="Quality of Sleep", y="Count", title="Figure 7: Quality of sleep separated \n by level of sedentary activity factored by sex")+
   theme_minimal()+
  theme(axis.text.x = element_text(angle = 90), plot.title = element_text(vjust = 1, size=12), axis.title.x = element_text(vjust = -2))
graph7

graph8 <- ggplot(data=dat2, aes(x=sleep2, fill = factor(gender))) + geom_bar(position="dodge") + facet_wrap(~minsPAlevel)+labs(x="Quality of Sleep", y="Count", title="Figure 8: Quality of sleep separated \n by level of physical activity factored by sex ")+
   theme_minimal()+
  theme(axis.text.x = element_text(angle = 90), plot.title = element_text(vjust = 1, size=12), axis.title.x = element_text(vjust = -2))
graph8


In Figure 7, the gender difference is less pronounced than it was in Figure 5. The more sedentary activity participants report, the smaller the role gender plays in their quality of sleep. The most variation exists among those with low sedentary activity, where females experience better quality sleep than males. In Figure 8, the only noticeable pattern involves participants with moderate and intense physical activity.

Age and Mental Health

#Original Graphs factored by age
graph9 <- ggplot(data=dat2, aes(x=depr2, fill = factor(agelevel))) + geom_bar(position="dodge") + facet_wrap(~minsSedlevel)+labs(x="Frequency of feeling depressed", y="Count", title="Figure 9: Frequency of feeling depressed separated \n by level of sedentary activity factored by age")+
   theme_minimal()+
 theme(axis.text.x = element_text(angle = 90), plot.title = element_text(vjust = 1, size=12), axis.title.x = element_text(vjust = -2))
graph9

graph10 <-ggplot(data=dat2, aes(x=depr2, fill = factor(agelevel))) + geom_bar(position="dodge") + facet_wrap(~minsPAlevel)+labs(x="Frequency of feeling depressed", y="Count", title="Figure 10: Frequency of feeling depressed separated \n by level of physical activity factored by age")+
   theme_minimal()+
 theme(axis.text.x = element_text(angle = 90), plot.title = element_text(vjust = 1, size=12), axis.title.x = element_text(vjust = -2))
graph10


In both Figure 9 and Figure 10, age does not seem to play a role in the relationship between feelings of depression and levels of sedentary and physical activity. The only potentially interesting relationship is how age plays a role among those with intense sedentary activity; however, there are not enough data to illustrate it. Too few participants reported intense sedentary behavior to draw a proper conclusion.

graph11 <- ggplot(data=dat2, aes(x=sleep2, fill = factor(agelevel))) + geom_bar(position="dodge") + facet_wrap(~minsSedlevel)+labs(x="Quality of Sleep", y="Count", title="Figure 11: Quality of sleep separated \n by level of sedentary activity factored by age")+
   theme_minimal()+
  theme(axis.text.x = element_text(angle = 90), plot.title = element_text(vjust = 1, size=12), axis.title.x = element_text(vjust = -2))
graph11

graph12 <- ggplot(data=dat2, aes(x=sleep2, fill = factor(agelevel))) + geom_bar(position="dodge") + facet_wrap(~minsPAlevel)+labs(x="Quality of Sleep", y="Count", title="Figure 12: Quality of sleep separated \n by level of physical activity factored by age")+
   theme_minimal()+
 theme(axis.text.x = element_text(angle = 90), plot.title = element_text(vjust = 1, size=12), axis.title.x = element_text(vjust = -2))
graph12


Figure 11 closely resembles Figure 9 in the pattern across age groups: age appears to have little to no role in the relationship between sedentary activity and quality of sleep. In both Figure 11 and Figure 12, too few participants reported intense sedentary or physical activity to draw a reliable conclusion when comparing them by age. In Figure 12, for both young and middle-aged participants, those who report moderate physical activity also report better sleep quality than those with low physical activity. For those in the oldest age group, however, there seems to be less of a difference.

Physical vs. Sedentary

phys1$PAD660 <- as.numeric(phys1$PAD660)
phys1$PAD680 <- as.numeric(phys1$PAD680)
graph13 <- ggplot(phys1, aes(x=PAD660, y=PAD680))+
  geom_point(alpha=0.8)+
  geom_smooth(method= "lm", se= FALSE)+
  xlab("Minutes of Vigorous Phys Activity Each Day")+
  ylab("Minutes Spent Sedentary Each Day")+
  ggtitle("Figure 13: Time Spent Active Vs. Time Spent Sedentary")+
  theme_minimal()
ggplotly(graph13)


Figure 13 shows the relationship between the amount of time people spend being active and the amount of time they spend sedentary. As expected, the more physical activity people perform, the less time they spend sedentary, but not by a substantial amount.

Empirical Models

Model 1: A simple linear regression of time spent doing physical activity on time spent sedentary

\[minsPA= \beta_0 + \beta_1 minsSed\]

Model 2: A simple linear regression of time spent doing physical activity on gender

\[minsPA= \beta_0 + \beta_1 gender\]

Model 3: A multiple linear regression of time spent doing physical activity on the mental health variables (in binary form: 0 for “Not at all”, 1 for anything else)

\[minsPA= \beta_0 + \beta_1 depr + \beta_2 sleep + \beta_3 feelBad + \beta_4 concen\]

Model 4: A multiple linear regression of time spent sedentary on the mental health variables (in binary form: 0 for “Not at all”, 1 for anything else)

\[minsSed= \beta_0 + \beta_1 depr + \beta_2 sleep + \beta_3 feelBad + \beta_4 concen\]
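The binary coding used in Models 3 and 4 can be sketched as follows. This is only an illustration on a toy data frame: the name `toy` and its contents are assumptions made for the example, and the actual recoding that produced `depr3`, `sleep3`, `feelBad3`, and `concen3` in `dat2` may differ.

```r
# Illustrative toy data (hypothetical); the labels mirror the screener
# response options quoted in this report.
toy <- data.frame(
  depr = c("Not at all", "Several days", "More than half the days",
           "Nearly every day", NA),
  stringsAsFactors = FALSE
)

# 0 for "Not at all", 1 for any other response; missing answers stay NA.
toy$depr3 <- as.integer(toy$depr != "Not at all")

print(toy)
```

Collapsing the three non-zero response options into a single category keeps each coefficient easy to read as a difference between two groups, at the cost of discarding how often the feeling was reported.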

Results

# Model 1: regress minutes of vigorous physical activity on minutes spent sedentary
dat2$minsPA <- as.numeric(dat2$minsPA)
dat2$minsSed <- as.numeric(dat2$minsSed)
m1 <- lm(minsPA ~ minsSed, data = dat2)

# Model 2: simple linear regression of minsPA on gender
m2 <- lm(minsPA ~ gender, data = dat2)

# Models 3 and 4: multiple linear regressions of minsPA and minsSed
# on the binary mental health variables
m3 <- lm(minsPA ~ depr3 + sleep3 + feelBad3 + concen3, data = dat2)
m4 <- lm(minsSed ~ depr3 + sleep3 + feelBad3 + concen3, data = dat2)


stargazer(list(m1, m2), type = 'html', title= "Simple Linear Regressions on Physical Activity", align=TRUE,
          covariate.labels = c("Mins Sedentary Activity", "Gender"),
          dep.var.labels   = "Mins Physical Activity",
          column.labels = c("Model 1", "Model 2"))

Simple Linear Regressions on Physical Activity

                                   Dependent variable: Mins Physical Activity
                                   Model 1 (1)        Model 2 (2)
Mins Sedentary Activity            -0.018**
                                   (0.008)
Gender                                                -16.356***
                                                      (2.966)
Constant                           82.352***          83.379***
                                   (2.965)            (1.928)
Observations                       1,271              1,271
R2                                 0.004              0.023
Adjusted R2                        0.003              0.023
Residual Std. Error (df = 1269)    52.737             52.224
F Statistic (df = 1; 1269)         5.244**            30.421***
Note: *p<0.1; **p<0.05; ***p<0.01

stargazer(list(m3, m4),  type = 'html', title= "Multiple Linear Regressions on Physical Activity With Depression Variables", align=TRUE,
          covariate.labels = c("Felt Depressed", "Poor/Too Much Sleep", "Felt Bad About Oneself","Trouble Concentrating" ),
          dep.var.labels   = c("Mins Physical Activity","Mins Sedentary"),
          column.labels = c("Model 3", "Model 4"))

Multiple Linear Regressions on Physical Activity With Depression Variables

                                   Dependent variable:
                                   Mins Physical Activity    Mins Sedentary
                                   Model 3 (1)               Model 4 (2)
Felt Depressed                     3.922                     -33.431*
                                   (4.750)                   (17.317)
Poor/Too Much Sleep                0.109                     15.810
                                   (3.362)                   (12.257)
Felt Bad About Oneself             -1.711                    30.534
                                   (5.407)                   (19.712)
Trouble Concentrating              -6.141                    17.497
                                   (4.789)                   (17.460)
Constant                           76.765***                 329.493***
                                   (1.883)                   (6.867)
Observations                       1,271                     1,271
R2                                 0.002                     0.006
Adjusted R2                        -0.001                    0.002
Residual Std. Error (df = 1266)    52.864                    192.728
F Statistic (df = 4; 1266)         0.529                     1.762
Note: *p<0.1; **p<0.05; ***p<0.01


Model 1 implies that a one-minute increase in the time spent sedentary is associated with a 0.018-minute decrease in the time spent doing vigorous physical activity. This is statistically significant at the 0.05 level. As for Model 2, which regresses minutes of vigorous physical activity on gender (a binary variable), a one-unit increase in gender is associated with a 16.356-minute decrease in time spent doing vigorous physical activity. We read this as female participants reporting about 16 fewer minutes of vigorous physical activity than male participants. Note that this reading requires female to be the higher-coded category; under a coding of 0 = female and 1 = male, the same coefficient would instead mean that males report fewer minutes. The coefficient is statistically significant at the 0.01 level.
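
To put the Model 1 slope in perspective, the fitted line can be evaluated at an illustrative value. The 300 sedentary minutes used here is an arbitrary example value, not a figure taken from the data:

\[\widehat{minsPA} = 82.352 - 0.018 \times 300 = 76.952\]

By the same slope, a full extra hour spent sedentary corresponds to only about \(0.018 \times 60 = 1.08\) fewer minutes of vigorous activity. Together with \(R^2 = 0.004\), this suggests the relationship is small in practical terms even though it is statistically significant.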

For our multiple linear regression model on physical activity, Model 3, we observe the following. Reporting any depression (moving from “not at all” to any of the other included options) is associated with a 3.922-minute increase in time spent doing vigorous physical activity; trouble sleeping or sleeping too much is associated with a 0.109-minute increase; having felt bad about oneself at any time over the 2 weeks prior to the screening is associated with a 1.711-minute decrease; and having had trouble concentrating at any point is associated with a 6.141-minute decrease. However, none of these coefficients is statistically significant at conventional levels, so the results are inconclusive.

For our multiple linear regression model on sedentary activity, Model 4, we observe the following. Reporting any depression (moving from “not at all” to any of the other included options) is associated with a 33.431-minute decrease in time spent sedentary, which is statistically significant at the 0.1 level only. Trouble sleeping or sleeping too much is associated with a 15.810-minute increase in time spent sedentary, having felt bad about oneself at any time over the 2 weeks prior to the screening with a 30.534-minute increase, and having had trouble concentrating at any point with a 17.497-minute increase. None of these three coefficients is statistically significant at conventional levels.
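
As a rough check on why the depression coefficient in Model 4 is flagged at the 0.1 level but not at the 0.05 level, the t-ratio can be recomputed from the rounded table values. This is a back-of-the-envelope sketch that assumes the large-sample normal approximation, which seems reasonable with 1,266 residual degrees of freedom:

\[t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} = \frac{-33.431}{17.317} \approx -1.93\]

Since \(1.645 < |t| < 1.960\), the estimate clears the two-sided 10% critical value but falls just short of the 5% one, consistent with the single star in the table.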


Conclusions

The general findings from visualizing the data are as follows:

  • A slight relationship between physical activity and feelings of depression was found, primarily among those who reported intense physical activity.
  • The amount of sedentary activity an individual reported did not appear to be related to mental health.
  • The amount of sedentary and physical activity reported did not appear to be related to the quality of sleep.
  • Of the demographic variables examined, the only one that played some role in the relationship between activity and both depression and quality of sleep was gender. Specifically, females with lower levels of physical activity had much higher rates of depression than males who reported the same amount of physical activity.

From our regressions, we conclude that minutes spent sedentary each day and minutes spent doing vigorous activity each day are negatively related, and that this relationship is statistically significant, although the model explains very little of the variation (R2 = 0.004). The direction is expected, as it is logical that more active people will spend less time sitting each day.

From our multiple linear regression models, we find only one statistically significant coefficient, and only at the 0.1 level: having felt depressed at any point within the 2 weeks prior to the screening is associated with 33.431 fewer minutes spent sedentary each day. This finding runs against our hypothesis. Thus, we cannot claim that more time spent doing vigorous physical activity each day correlates with better mental health, nor that more time spent sedentary correlates with poorer mental health.

Limitations and Discrepancies

Although the original data represent the national population and are recent enough to assume any relationships found between variables are still applicable, there are several limitations to the data.

The limitations are as follows:

  • Since the data are primarily based on interviews, the responses may not be completely accurate. We can assume there will be issues with people accurately recalling whether they experienced a feeling “not at all”, “several days”, “more than half the days”, or “nearly every day”.
  • Interviews can be limiting when it comes to defining physical activity. Unless participants were properly monitoring their physical and sedentary activity, we can expect some inaccuracies in the data reported.
  • Missing data could be affecting our results. The missing data stem from people refusing to answer, those who did not know their answer, NA’s, and general missing data. From the original 5,000 participants, we were left with 1,271 observations, which means we are unsure if this sample is truly representative of the national population.
  • A majority of the relationships found between variables were minimal and are statistically insignificant.

These limitations could skew our analysis considerably, especially since the relationships we found were either statistically insignificant or only barely significant.

Other limitations include the lack of quantitative variables for meaningful regression analysis. Only 2 of our variables were truly quantitative (minsPA and minsSed); the rest of our predictors were qualitative, which offer less interesting and promising insight.

One major discrepancy in the data to note is that the sample sizes among each category (“low”, “moderate”, “intense”) are not evenly distributed. Thus, comparisons among groups are less accurate than they would be with large, evenly sized samples in each category.